105 research outputs found
Evaluation and analysis of an approach to neural sensory fusion using spiking vision/audio sensors and convolutional neural networks
This work aims to advance the knowledge and possible hardware implementations of Deep Learning mechanisms, as well as the efficient use of sensory fusion with such mechanisms. First, current parallel programming languages are analysed and studied, along with Deep Learning mechanisms for audiovisual sensory fusion using neuromorphic sensors on FPGA platforms. Based on these studies, solutions implemented in OpenCL as well as in dedicated hardware, described in SystemVerilog, are proposed for the acceleration of Deep Learning algorithms, starting with a vision sensor as input. The results are analysed and compared. Next, an audio sensor is added and classical statistical mechanisms are proposed which, without providing learning capacity, allow the information from both sensors to be integrated; the results obtained are analysed together with their limitations. Finally, to give the system learning capacity, Deep Learning mechanisms, in particular CNNs, are used to fuse the audiovisual information and train the model to perform a specific task. The performance and efficiency of these mechanisms are evaluated, drawing conclusions and proposing improvements that are left as future work.
Retinal ganglion cell software and FPGA model implementation for object detection and tracking
This paper describes the software and FPGA implementation of a Retinal Ganglion Cell (RGC) model which detects moving objects. It is shown how this processing, in conjunction with a Dynamic Vision Sensor as its input, can be used to extrapolate information about object position. Software-wise, a system based on an array of these RGCs has been developed in order to obtain up to two trackers. These can track objects in a scene, from a still observer, and are inhibited when saccadic camera motion happens. The entire processing takes on average 1000 ns/event. A simplified version of this mechanism, with a mean latency of 330 ns/event at 50 MHz, has also been implemented in a Spartan6 FPGA.
European Commission FP7-ICT-600954
Ministerio de Economía y Competitividad TEC2012-37868-C04-02
Junta de Andalucía P12-TIC-130
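The event-driven tracking described above can be illustrated with a minimal sketch: a tracker whose centre follows nearby DVS events. The class name, acceptance radius and update rate below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of an event-driven object tracker: each DVS event that falls
# inside the tracker's window pulls the position estimate toward it with an
# exponential moving average. Parameters are illustrative, not from the paper.

class EventTracker:
    def __init__(self, x0, y0, radius=20.0, alpha=0.1):
        self.x, self.y = float(x0), float(y0)
        self.radius = radius   # events farther than this are ignored
        self.alpha = alpha     # update rate per accepted event

    def update(self, ex, ey):
        """Process one event at pixel (ex, ey); return True if accepted."""
        if (ex - self.x) ** 2 + (ey - self.y) ** 2 <= self.radius ** 2:
            self.x += self.alpha * (ex - self.x)
            self.y += self.alpha * (ey - self.y)
            return True
        return False
```

Because only a few additions and multiplications run per event, this style of processing maps naturally onto the sub-microsecond per-event latencies reported above.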
Live Demonstration: Retinal ganglion cell software and FPGA implementation for object detection and tracking
This demonstration shows how object detection and tracking are possible thanks to a new implementation which takes inspiration from the visual processing of a particular type of ganglion cell in the retina.
Efficient DMA transfers management on embedded Linux PSoC for Deep-Learning gestures recognition: Using Dynamic Vision Sensor and NullHop one-layer CNN accelerator to play RoShamBo
This demonstration shows a Dynamic Vision Sensor able to capture visual motion at a speed equivalent to a high-speed camera (20k fps). The collected visual information is presented as a normalized histogram to a CNN accelerator hardware, called NullHop, that is able to process a pre-trained CNN to play RoShamBo against a human. The CNN designed for this purpose consists of 5 convolutional layers and a fully connected layer. The latency for processing one histogram is 8 ms. NullHop is deployed on the FPGA fabric of a PSoC from Xilinx, the Zynq 7100, which is based on a dual-core ARM computer and a Kintex-7 with 444K logic cells, integrated in the same chip. The ARM computer runs Linux, and a specific C++ controller runs the whole demo. This controller runs in user space in order to extract the maximum throughput thanks to an efficient use of the AXIStream, based on DMA transfers. The short delay needed to process one visual histogram allows us to average several consecutive classification outputs, providing the best estimation of the symbol that the user presents to the visual sensor. This output is then mapped to present the winning symbol within the 60 ms latency that the brain considers acceptable before suspecting a trick.
Ministerio de Economía y Competitividad TEC2016-77785-
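The output-averaging idea above — classify each 8 ms histogram, then average several consecutive outputs before committing to a symbol — can be sketched as follows. The class labels and window size are illustrative assumptions.

```python
from collections import deque

# Hedged sketch of averaging consecutive classification outputs over a short
# sliding window, then taking the argmax of the mean scores. Labels and the
# window size are illustrative, not taken from the demonstration.

SYMBOLS = ["rock", "paper", "scissors", "background"]

class OutputAverager:
    def __init__(self, window=5):
        self.window = deque(maxlen=window)

    def push(self, scores):
        """Add one classification output (one score per symbol)."""
        self.window.append(scores)

    def decision(self):
        """Return the symbol with the highest mean score over the window."""
        n = len(self.window)
        means = [sum(s[i] for s in self.window) / n for i in range(len(SYMBOLS))]
        return SYMBOLS[max(range(len(SYMBOLS)), key=means.__getitem__)]
```

With a 5-frame window and 8 ms per classification, a decision still fits comfortably inside the 60 ms budget mentioned above.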
Neuromorphic Approach Sensitivity Cell Modeling and FPGA Implementation
Neuromorphic engineering takes inspiration from biology to solve engineering problems using the organizing principles of biological neural computation. This field has demonstrated success in sensor-based applications (vision and audition) as well as in cognition and actuators. This paper is focused on mimicking an interesting functionality of the retina that is computed by one type of Retinal Ganglion Cell (RGC): the early detection of approaching (expanding) dark objects. This paper presents the software and hardware logic FPGA implementation of this approach sensitivity cell. It can be used in later cognition layers as an attention mechanism. The input of this hardware-modeled cell comes from an asynchronous spiking Dynamic Vision Sensor, which leads to an end-to-end event-based processing system. The software model has been developed in Java, and computed with an average processing time per event of 370 ns on a NUC embedded computer. The output firing rate for an approaching object depends on the cell parameters that represent the number of input events needed to reach the firing threshold. For the hardware implementation on a Spartan6 FPGA, the processing time is reduced to 160 ns/event with the clock running at 50 MHz.
Ministerio de Economía y Competitividad TEC2016-77785-P
Unión Europea FP7-ICT-60095
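The threshold behaviour described above — output firing rate set by the number of input events needed to reach the firing threshold — can be sketched as a simple integrate-and-fire counter. The class and parameter names are illustrative assumptions, not the paper's model.

```python
# Hedged sketch of an approach-sensitivity cell as an event counter: dark
# (OFF-polarity) events accumulate, and the cell emits a spike each time the
# count reaches the threshold. The threshold value is illustrative.

class ApproachCell:
    def __init__(self, threshold=100):
        self.threshold = threshold
        self.count = 0
        self.spikes = 0

    def event(self, polarity):
        """Process one DVS event; polarity False = OFF (darkening)."""
        if not polarity:          # only dark (OFF) events drive the cell
            self.count += 1
            if self.count >= self.threshold:
                self.count = 0    # reset after firing
                self.spikes += 1
                return True
        return False
```

A lower threshold makes the cell fire earlier and more often for the same expanding stimulus, which is the parameter dependence noted in the abstract.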
Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks: the many convolution and fully-connected layers demand a large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. In this paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
Ministerio de Economía y Competitividad TEC2016-77785-
Spiking row-by-row FPGA Multi-kernel and Multi-layer Convolution Processor.
Spiking convolutional neural networks (SCNNs) have become a novel approach for machine vision tasks, due to the low latency with which they process an input stimulus from a scene and the low power consumption of this kind of solution. Event-based systems only perform sum operations, instead of the sum-of-products of frame-based systems. In this work an upgrade of a neuromorphic event-based convolution accelerator for SCNNs, which is able to perform multiple layers with different kernel sizes, is presented. The system has a latency per layer from 1.44 μs to 9.98 μs for kernel sizes from 1x1 to 7x7.
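The sum-only property contrasted above with frame-based sum-of-products can be illustrated with a minimal event-driven convolution: each incoming spike only adds the kernel into the membrane potentials around its address, and a neuron emits an output event when its potential crosses a threshold. Grid size, kernel and threshold below are illustrative assumptions.

```python
# Hedged sketch of event-driven convolution: per input event, the kernel is
# added into the membrane potentials of the surrounding neurons (no products
# with pixel values), and a neuron spikes and resets on reaching a threshold.

def spike_convolve(events, width, height, kernel, threshold):
    k = len(kernel) // 2                      # kernel radius (odd-sized kernel)
    membrane = [[0.0] * width for _ in range(height)]
    out_events = []
    for (x, y) in events:                     # one addition pass per event
        for dy in range(-k, k + 1):
            for dx in range(-k, k + 1):
                px, py = x + dx, y + dy
                if 0 <= px < width and 0 <= py < height:
                    membrane[py][px] += kernel[dy + k][dx + k]
                    if membrane[py][px] >= threshold:
                        membrane[py][px] = 0.0    # reset on spike
                        out_events.append((px, py))
    return out_events
```

Because the work per event is bounded by the kernel size, the per-layer latency scales with kernel size, consistent with the 1x1-to-7x7 range reported above.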
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks: the many convolution and fully-connected layers demand a large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies, and makes FPGAs potentially powerful solutions for real-time classification with CNNs. Both Altera and Xilinx have adopted the OpenCL co-design framework from GPUs for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of the Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.
System based on inertial sensors for behavioral monitoring of wildlife
A sensor network integrates multiple sensors in a system to collect information about different environment variables. Monitoring systems allow us to determine the current state of a subject, to know its behavior and sometimes to predict what is going to happen. This work presents a monitoring system for semi-wild animals that captures their actions using an IMU (inertial measurement unit) and a sensor fusion algorithm. Based on an ARM Cortex-M4 microcontroller, this system sends the data of the different sensor axes using ZigBee technology in two different operation modes: RAW (logging all information onto an SD card) or RT (real-time operation). The sensor fusion algorithm improves precision and reduces noise interference.
Junta de Andalucía P12-TIC-130
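A common form of IMU sensor fusion of the kind described above is a complementary filter, which blends the fast-but-drifting gyroscope with the noisy-but-drift-free accelerometer. The abstract does not specify the algorithm used, so this is a generic sketch; the blend factor is an illustrative value.

```python
import math

# Hedged sketch of IMU sensor fusion with a complementary filter: the gyro
# estimate is integrated over dt, the accelerometer gives an absolute (gravity
# based) angle, and the two are blended. alpha=0.98 is illustrative.

def complementary_filter(angle, gyro_rate, ax, az, dt, alpha=0.98):
    """One pitch-estimate update from gyro rate (rad/s) and accel axes (g)."""
    accel_angle = math.atan2(ax, az)             # gravity-based pitch estimate
    gyro_angle = angle + gyro_rate * dt          # integrated gyro estimate
    return alpha * gyro_angle + (1 - alpha) * accel_angle
```

The accelerometer term slowly pulls the estimate back toward the gravity reference, which is what suppresses gyro drift and accelerometer noise at the same time.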
Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator
Many FPGA vendors have recently included embedded processors in their devices, like Xilinx with ARM Cortex-A cores, together with programmable logic cells. These devices are known as Programmable System on Chip (PSoC). Their ARM cores (embedded in the processing system, or PS) communicate with the programmable logic cells (PL) using ARM-standard AXI buses. In this paper we analyse the performance of exhaustive data transfers between PS and PL for a Xilinx Zynq FPGA in a real co-design scenario for a Convolutional Neural Network (CNN) accelerator, which processes, in dedicated hardware, a stream of visual information from a neuromorphic visual sensor for classification. On the PS side, a Linux operating system is running, which collects visual events from the neuromorphic sensor into a normalized frame, then transfers these frames to the multi-layer CNN accelerator and reads back the results, using an AXI-DMA bus in a per-layer way. As these kinds of accelerators try to process information as quickly as possible, data bandwidth becomes critical, and maintaining a well-balanced data throughput rate requires some considerations. We present and evaluate several data partitioning techniques to improve the balance between RX and TX transfers, and two different ways of managing the transfers: through a polling routine at the user level of the OS, and through a dedicated interrupt-based kernel-level driver. We demonstrate that for long enough packets, the kernel-level driver solution achieves better timing in computing a CNN classification example. The main advantage of the kernel-level driver is that it offers a safer solution and leaves task scheduling to the OS to manage other processes important for our application, such as frame collection from sensors and their normalization.
Ministerio de Economía y Competitividad TEC2016-77785-
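The two transfer-management strategies contrasted above can be sketched as a small simulation: a user-level polling loop that spins on a completion flag versus a blocking wait that lets the OS schedule other tasks until completion is signalled (standing in here for the DMA interrupt). All names and timings are illustrative, not the paper's driver code.

```python
import threading, time

# Hedged sketch: a thread event stands in for DMA completion. Polling spins
# on the flag (lowest wake-up latency, burns CPU); blocking waits and lets
# the scheduler run other work, like frame collection and normalization.

def run_transfer(done_event, delay_s=0.01):
    """Simulate a DMA engine signalling completion after delay_s seconds."""
    def engine():
        time.sleep(delay_s)
        done_event.set()
    threading.Thread(target=engine, daemon=True).start()

def wait_polling(done_event):
    spins = 0
    while not done_event.is_set():   # user-level busy loop
        spins += 1
    return spins

def wait_blocking(done_event):
    done_event.wait()                # OS schedules other processes meanwhile
    return done_event.is_set()
```

For long packets the cost of one wake-up is amortized, which matches the finding above that the interrupt-based kernel-level driver wins for long enough packets.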